Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: parallel generate, speed up 10% #1001

Merged
merged 6 commits into from
Apr 3, 2024
Merged

Conversation

xusd320
Copy link
Contributor

@xusd320 xusd320 commented Mar 29, 2024

Close #976

改动点:

  1. generate 阶段将 entry chunk 提前到和 normal chunk 并行执行。entry 中如果需要记录子 chunk 的 hash 的话,先用 nanoid 生成一个 8 位的随机 id 作为 placeholder,在所有 normal chunk generate 成后,再将 placeholder 替换为真实文件 hash ;
  2. 添加 chunk hash replace 的 e2e 用例;
  3. thread_pool 暴漏 rayon 的 join 方法,用于并行执行两个 op;
  4. ChunkPot::split_modules 没必要返回 result 类型,顺手改了;

验证性能提升:
yuyanAssets 提升 10%

当前分支:
image

master:
image

Summary by CodeRabbit

  • New Features

    • Added functionality to generate chunk files with hashed names.
    • Introduced the ability to set the mode to "production" and enable hashing for generated chunks.
    • Added a new public function to the thread pool module for concurrent execution.
    • Validated correct hash value replacement in build results.
  • Refactor

    • Refactored chunk file generation logic.
    • Reorganized chunk file processing and replacement logic.
  • Documentation

    • Updated mako.config.json to include new configuration options.
  • Style

    • Set font sizes in CSS files for better styling.

Copy link
Contributor

coderabbitai bot commented Mar 29, 2024

Walkthrough

The changes introduce new dependencies, refactor chunk file generation logic, enhance file processing, and add functionality for concurrent operations. Configuration options for production mode and chunk hashing are now available.

Changes

File Summary
crates/mako/Cargo.toml Added nanoid dependency version 0.4.0.
crates/mako/src/chunk_pot.rs & util.rs Publicized stylesheet field, adjusted return types, and made certain items public. Updated file_content_hash to truncate hash to a specified length.
crates/mako/src/generate_chunks.rs Refactored chunk file generation, introduced hashing, and reorganized processing logic.
crates/mako/src/thread_pool.rs Added join function for concurrent execution of closures.
e2e/fixtures/generate.chunks-hash-replace/expect.js New file for validating hash value replacements.
e2e/fixtures/generate.chunks-hash-replace/mako.config.json Introduces production mode and chunk hashing configuration.
e2e/fixtures/generate.chunks-hash-replace/src/*.css/.ts Added new CSS and TypeScript files with styling and module imports.

Poem

A rabbit hops, code changes unfold,
Dependencies new, logic bold.
Hashes truncate, threads unite,
Files styled, in code's dance, they write. 🐇✨

Note

Pull Request Summarized by CodeRabbit Free

Your organization is on the Free plan. CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please upgrade your subscription to CodeRabbit Pro by visiting https://coderabbit.ai


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d1fdb1a and 811c976.
Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (11)
  • crates/mako/Cargo.toml (1 hunks)
  • crates/mako/src/chunk_pot.rs (7 hunks)
  • crates/mako/src/chunk_pot/util.rs (1 hunks)
  • crates/mako/src/generate_chunks.rs (2 hunks)
  • crates/mako/src/thread_pool.rs (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/expect.js (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/mako.config.json (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/index.css (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/index.ts (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/lazy.css (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/lazy.ts (1 hunks)
Files skipped from review due to trivial changes (1)
  • e2e/fixtures/generate.chunks-hash-replace/src/index.css
Additional Context Used
Additional comments not posted (14)
e2e/fixtures/generate.chunks-hash-replace/src/lazy.ts (1)

1-2: The changes in lazy.ts are straightforward and correct. It properly imports a CSS file and exports a string value.

e2e/fixtures/generate.chunks-hash-replace/mako.config.json (1)

1-4: The mako.config.json file is correctly formatted and the configuration settings align with the PR's objectives of optimizing the build process.

e2e/fixtures/generate.chunks-hash-replace/src/lazy.css (1)

1-5: The CSS changes in lazy.css are syntactically correct and follow standard CSS practices.

e2e/fixtures/generate.chunks-hash-replace/src/index.ts (1)

1-3: The changes in index.ts are correct, including the dynamic import of the lazy module, which aligns with modern JavaScript practices for lazy-loading.

e2e/fixtures/generate.chunks-hash-replace/expect.js (1)

1-18: The assertions in expect.js are correctly implemented to test the chunk hash replacement functionality, aligning with the PR's objectives of ensuring the integrity of the new parallel generation mechanism.

crates/mako/src/thread_pool.rs (1)

29-39: The addition of the join function in thread_pool.rs is correctly implemented to facilitate parallel execution of operations using the Rayon thread pool, adhering to Rust's concurrency best practices.

crates/mako/Cargo.toml (1)

21-21: The addition of the nanoid dependency with version 0.4.0 in Cargo.toml is correctly specified and aligns with the PR's requirements for generating temporary placeholders for chunk hashes.

crates/mako/src/chunk_pot.rs (2)

28-28: Making the stylesheet field public in the ChunkPot struct aligns with the PR's objectives of enhancing chunk processing.


186-200: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [133-197]

The adjustments to return types and making the CssModules struct and get_css_chunk_filename function public are correctly implemented, facilitating the new functionality introduced in the PR.

crates/mako/src/chunk_pot/util.rs (1)

307-313: The introduction of CHUNK_FILE_NAME_HASH_LENGTH to standardize hash lengths across the application is a good practice. However, it's crucial to ensure that the chosen length of 8 characters strikes a balance between uniqueness and brevity to minimize the risk of hash collisions. Consider analyzing the potential for collisions with the expected volume of files and adjusting the length if necessary.

crates/mako/src/generate_chunks.rs (4)

11-11: The use of nanoid for generating placeholders and the introduction of ChunksHashPlaceholder and ChunksHashReplacer are well-considered choices for managing the complexity of parallel chunk generation. Ensure to conduct performance testing to assess the impact of nanoid generation and analyze edge cases, such as the unlikely but possible generation of identical placeholders for different chunks.

Also applies to: 55-56


64-74: The parallel generation of entry and normal chunk files using thread_pool::join is a significant improvement in the build process. Ensure thorough testing is conducted to identify any potential data races or concurrency issues, especially considering the shared state that might be accessed by these parallel tasks.


150-156: The handling of CSS filenames and placeholders within the chunk generation process is well-integrated. Ensure that the naming and replacement logic for CSS files is reviewed for consistency and efficiency, particularly considering the unique requirements and conventions for CSS files compared to JS chunks.

Also applies to: 158-161


285-285: The hash_file_name function is crucial for generating hashed filenames. Ensure that edge cases, such as files without extensions or with multiple dots in their names, are adequately tested to guarantee the function's robustness in all scenarios.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 811c976 and d8cd5bc.
Files selected for processing (1)
  • crates/mako/src/generate_chunks.rs (2 hunks)
Additional Context Used
Additional comments not posted (5)
crates/mako/src/generate_chunks.rs (5)

11-11: The addition of nanoid::nanoid aligns with the PR's objective to use temporary placeholders for chunk hashes. This is a lightweight and efficient way to generate unique IDs, suitable for the intended use case.


55-56: The introduction of ChunksHashPlaceholder and ChunksHashReplacer maps is a good design choice. It clearly separates the concerns of holding placeholders for chunk hashes and their eventual replacement values, contributing to code readability and maintainability.


72-114: The parallel generation of entry and normal chunk files using thread_pool::join is a significant improvement. It leverages concurrency to enhance performance, directly addressing the PR's objective. However, ensure that the thread pool is appropriately sized to avoid potential overhead from too many threads competing for resources.


81-114: The logic for replacing chunk hash placeholders with actual hashes is well-implemented. It's crucial, however, to ensure that the replacement operation does not introduce any unintended modifications to the chunk files. Consider adding more detailed tests to cover edge cases, such as placeholders appearing in unexpected parts of the chunk content.


255-298: The replace_chunks_placeholder function is crucial for the new chunk generation logic. It replaces placeholders in chunk files with actual hashes. The error handling here is specific and informative, which is good practice. However, consider the performance implications of iterating over the content for each placeholder. For large files or a high number of placeholders, this could become a bottleneck. Investigating more efficient string replacement techniques or ensuring that the number of placeholders is kept minimal could mitigate potential performance issues.

@xusd320 xusd320 changed the title perf: parallel generate, speed up 10% [WIP] perf: parallel generate, speed up 10% Mar 29, 2024
@xusd320 xusd320 force-pushed the perf/parallel-generate branch from d8cd5bc to d6a0632 Compare March 29, 2024 11:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d1fdb1a and d6a0632.
Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (11)
  • crates/mako/Cargo.toml (1 hunks)
  • crates/mako/src/chunk_pot.rs (7 hunks)
  • crates/mako/src/chunk_pot/util.rs (1 hunks)
  • crates/mako/src/generate_chunks.rs (2 hunks)
  • crates/mako/src/thread_pool.rs (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/expect.js (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/mako.config.json (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/index.css (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/index.ts (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/lazy.css (1 hunks)
  • e2e/fixtures/generate.chunks-hash-replace/src/lazy.ts (1 hunks)
Files skipped from review as they are similar to previous changes (10)
  • crates/mako/Cargo.toml
  • crates/mako/src/chunk_pot.rs
  • crates/mako/src/chunk_pot/util.rs
  • crates/mako/src/thread_pool.rs
  • e2e/fixtures/generate.chunks-hash-replace/expect.js
  • e2e/fixtures/generate.chunks-hash-replace/mako.config.json
  • e2e/fixtures/generate.chunks-hash-replace/src/index.css
  • e2e/fixtures/generate.chunks-hash-replace/src/index.ts
  • e2e/fixtures/generate.chunks-hash-replace/src/lazy.css
  • e2e/fixtures/generate.chunks-hash-replace/src/lazy.ts
Additional Context Used
Additional comments not posted (5)
crates/mako/src/generate_chunks.rs (5)

11-11: The addition of nanoid as a dependency for generating unique placeholders is a good choice due to its efficiency and minimal collision risk. Ensure that the version 0.4.0 is compatible with other dependencies and the project's overall ecosystem.


55-56: Introducing ChunksHashPlaceholder and ChunksHashReplacer as HashMap types is a clear and efficient way to manage the mapping between chunk IDs and their respective placeholders or actual hash values. This approach supports the parallel generation and replacement logic effectively.


72-115: The parallel generation of entry and normal chunk files using thread_pool::join is a significant improvement. It aligns with the PR's objective to enhance performance through concurrency. However, ensure that the thread pool is adequately sized to prevent potential bottlenecks or excessive resource usage.


81-115: The logic for replacing placeholders in entry chunk files with actual hash values is well-implemented. It's crucial to ensure that this replacement process is thoroughly tested, especially in edge cases where placeholders might not be found or replaced correctly.


256-299: The replace_chunks_placeholder function is crucial for the new parallel generation logic, ensuring that placeholders in chunk files are replaced with actual hash values. It's important to handle errors gracefully and provide clear messages for debugging. Consider adding more detailed logging or metrics around this process to help identify and troubleshoot issues in production environments.

@xusd320 xusd320 changed the title [WIP] perf: parallel generate, speed up 10% perf: parallel generate, speed up 10% Mar 29, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d6a0632 and d054e50.
Files selected for processing (1)
  • crates/mako/src/generate_chunks.rs (2 hunks)
Additional Context Used
Additional comments not posted (5)
crates/mako/src/generate_chunks.rs (5)

11-11: Importing nanoid aligns with the PR objectives for efficient chunk hashing. Ensure that the nanoid version (v0.4.0) is compatible with the project's other dependencies to avoid conflicts.


55-56: The introduction of ChunksHashPlaceholder and ChunksHashReplacer types is a good practice for managing chunk hash placeholders and their replacements. This enhances code readability and maintainability.


72-115: The parallel generation of entry and normal chunk files using thread_pool::join is a significant improvement. It aligns with the PR's goal of speeding up the build process. However, ensure that the error handling within the parallel execution paths is robust and that any potential panics are gracefully handled.


81-115: The logic for replacing chunk hash placeholders in entry chunk files is well-implemented. However, consider adding more detailed logging or metrics around the placeholder replacement process to aid in debugging and performance monitoring.


256-299: The replace_chunks_placeholder function is crucial for the dynamic replacement of chunk hash placeholders. It's well-implemented, but consider adding error logging for each specific failure case to aid in troubleshooting. Additionally, ensure that the performance impact of iterating over chunk files and performing replacements is minimal, especially for large builds.

@xusd320 xusd320 changed the title perf: parallel generate, speed up 10% WIP perf: parallel generate, speed up 10% Mar 29, 2024
@xusd320 xusd320 changed the title WIP perf: parallel generate, speed up 10% perf: parallel generate, speed up 10% Apr 1, 2024
Copy link
Member

@stormslowly stormslowly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以记一个 action, 看下 split_modules 在 entry chunk 中重复执行的一次开销大不大。
如果需要优化的话 generate_chunk_files 里面实现逻辑调整成
chunk -> chunk_pot -> chunk_files 这样的一个流程
在 chunk_pot 阶段,配合 module graph 有能算出各个 entry 的 js_map 和 css_map

@sorrycc sorrycc merged commit b106e28 into master Apr 3, 2024
7 of 8 checks passed
@delete-merged-branch delete-merged-branch bot deleted the perf/parallel-generate branch April 3, 2024 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] chunk to ast 的并行优化方案
3 participants